Abstract

This paper presents a systematic approach for implementing a class of nonlinear signal processing systems 
as a distributed web service, which in turn is used to solve optimization problems in a distributed, asynchronous 
fashion. As opposed to requiring a specialized server, the presented approach requires only the use of a commodity 
database back-end as a central resource, as might typically be used to serve data for websites having large numbers 
of concurrent users. In this sense the presented approach leverages not only the scalability and robustness of various 
database systems in sharing variables asynchronously between workers, but also critically it leverages the tools 
of signal processing in determining how the optimization algorithm might be organized and distributed among 
various heterogeneous workers. A publicly accessible implementation is also presented, utilizing Firebase as a back-end
server, and illustrating the use of the presented approach in solving various optimization problems commonly
arising in the context of signal processing. 
I. Introduction


In designing and implementing signal processing systems, a general implementation strategy is to (1) begin with
a set of desired equations to be satisfied, (2) represent these equations as a graphical structure, (3) distribute state 
throughout the graph, e.g. introducing scalar or vector delay elements, and (4) determine a protocol for exchanging 
state, resulting in an algorithm or iteration satisfying the original equations. Various specific methods consistent 
with this general approach have been described formally, e.g. in [1], [2]. 

Consistent with these steps, the issue of distributing state is perhaps the most central issue in effectively 
distributing algorithms in general, including distributing algorithms across multiple heterogeneous processing nodes. 
As has been discussed in [3], this observation suggests opportunity in utilizing the general strategy of specifying 
algorithms first using a declarative language, which after determining a protocol for distributing and exchanging 
state, would be decomposed as an ensemble of distributed programs and implemented on processing nodes using 
imperative languages. 

The formal approaches used in implementing signal processing systems form a broad and concrete class of 
examples that are consistent with this general strategy, with state-free signal-flow diagrams being a declarative 
representation, and with an eventual arrangement of run-loops being imperative. Drawing upon this, the results 
outlined in [4]-[6] describe a straightforward method for implementing a variety of optimization algorithms by 
casting them as signal processing systems, in turn leveraging the various common associated implementation 
strategies in distributing and transferring state. 

The intent of this paper is to describe a distributed web service for solving optimization problems that results 
as a consequence of the way of thinking described in [4]-[6]. The service is freely accessible online as part of 
the general site “Signal Processing Conservation” [7], which provides a general overview and examples of the use 
of conservation principles in signal processing systems, importantly also describing the mathematical foundation 
underlying [4]-[6]. The portion of the site containing the web service for optimization discussed in this paper is 
available at http://optimization.spconservation.org, which we refer to herein as “O-SPC”.

The architecture of O-SPC in particular is built on Firebase [8] and utilizes the service primarily as a high-performance
back-end for asynchronous representational state transfer between browser-based clients, rather than
as a centralized resource for coordinating data processing as with [9]. In this sense, O-SPC represents
an example of how the thinking described in [4]-[6] can be used to create a performant system operating in the 
somewhat extreme case where numerical computation is distributed entirely to the extremities of the graph. The 
considerations described in this paper would similarly apply to the creation of a web-based optimization service 
utilizing an alternative key-value store system, e.g. MongoDB [10] or Redis [11], or any number of relational 
database systems. In each of these cases, the performance of the distributed system would be able to draw upon 
the particular strengths of the data store being utilized. 

We begin in Section II by specifying the targeted class of signal processing systems and reviewing their utility as 
optimization algorithms. In Section III we focus on general considerations regarding their distributed implementation 
as a web service, consistent with the architecture of O-SPC. In Section IV, we collate the numerical experiments
referenced throughout and provide concluding remarks. 
Fig. 1. An illustration of (a) the primal optimization problem, (b) the associated stationarity conditions derived from [4], and (c) the
transformed stationarity conditions in [5]. (d) The implementation of (2) utilized in [12] obtained by distributing state to form an algorithm.
(e) The compute architecture leveraged by O-SPC by distributing state via a database: a client populates the database with a problem instance
and then uncoordinated workers asynchronously implement the conditions in (2).


























II. Signal processing systems and optimization 


The general framework presented in [5] facilitates the construction of distributed, asynchronous signal processing
systems for solving optimization problems by analyzing the structure of the optimization problem itself and
without relying on any existing non-distributed and/or synchronous methods. Therefore using this framework, 
signal processing systems, and by extension optimization algorithms, may be generated that might not be readily 
derived by conventional techniques. In the remainder of this section we briefly review the key steps in casting 
optimization algorithms as signal processing systems. 

A conservative signal processing system is one for which the variables available for interconnection between 
subsystems admit an organization adhering to an indefinite quadratic form of a particular class that is invariant to 
the evolution of the system [4]. The utility of conservation principles in [5] is twofold: (1) in defining the primal 
optimization problem in Fig. 1(a) and its dual so that the joint feasibility conditions depicted in Fig. 1(b) serve as 
sufficient conditions for stationarity, and (2) in transforming said conditions into the algebraic form illustrated in 
Fig. 1(c) where $R$ and $H$ are orthogonal matrices, $m$ is a generally nonlinear map, and $e$ is a system bias.
The maps $m$ and $H$ as well as the bias $e$ are associated with the
set constraints $\mathcal{A}_k$ and objective functionals $Q_k$ in Fig. 1(a) defined on the decision variables $a_k$, while $R$ is given
by
where $A$ represents the aggregate linear equality constraints involving only the primal decision variables. Without
loss of generality, the system in Fig. 1(c) may be recast into an equivalent system, in the sense that a solution to
one yields a solution to both, of the form

$c^* = m(d^*)$ and $d^* = G c^* + f$ (2)

where $c^*, d^* \in \mathbb{R}^K$ denote a solution, $G : \mathbb{R}^K \to \mathbb{R}^K$ is an orthogonal matrix, and $f \in \mathbb{R}^K$ is a system bias.
Figure 1(d) illustrates the reduced system (2) utilized in [12] where the algebraic loops have been broken by 
inserting state/memory into the system. 

The precompute required to assemble a signal processing system of the presented class is analytic and involves 
purely linear operations. In particular, aside from the computation of $R$ in (1), the algebraic reduction of $(R, H, e)$
to $(G, f)$ corresponds to identifying the intersection of affine subspaces and thus can be expressed in closed form.
The postcompute associated with recovering the solution to the optimization problem given a solution (c*,d*) to 
(2) is also linear. For example, let $a_i$ denote a primal decision variable and assume the precompute retains the
system variables $c_i$ and $d_i$ associated with $a_i$. Then, the value at a stationary point $a_i^*$ of the problem is

Si = \(£,+c’^) or = (3) 

depending on whether $a_i$ is an input to or output from $A$, respectively.

In the context of numerically solving (2) by generating state sequences $c^n$ and $d^n$, we refer to an
iterative solver as a system implementation in which the processing directly yields the next state values and an
incremental solver as one in which the processing yields values to be added to the current state in order to produce 
the next. We refer to either solver as being filtered when additional processing is used to produce the next state 
value as an affine combination of the current state value and the state value produced by the unfiltered solver. A 
sufficient condition under which the state sequences converge to a solution (c*,d*) of (2) that encompasses the 
numerical examples presented in this paper (provided that we appropriately implement the filtered solvers) is that 
the nonlinear map m be non-expansive, i.e. m must satisfy 

$\|m(v) - m(u)\|_2 \le \|v - u\|_2 \quad \forall\, u, v \in \mathbb{R}^K.$ (4)

Convergence is in particular in the Euclidean sense for synchronous implementations and in mean square for 
stochastic/asynchronous implementations; we refer to [13] for a complete treatment. For the purpose of illustration 
and not by limitation, we assume hereon that m is a coordinatewise nonlinearity; this assumption holds for all
numerical examples in this paper. The handling of general nonlinearities follows in an analogous way. 
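A short derivation may help connect the coordinatewise assumption to condition (4). As a sketch, assume each scalar component $m_j$ of $m$ is itself non-expansive, i.e. $|m_j(v_j) - m_j(u_j)| \le |v_j - u_j|$, and write $K$ for the number of coordinates (both the symbol $m_j$ and the dimension $K$ follow the notation used for the state variables in Section III). Summing the squared scalar bounds then gives the vector condition:

```latex
\|m(v) - m(u)\|_2^2
  = \sum_{j=1}^{K} \bigl| m_j(v_j) - m_j(u_j) \bigr|^2
  \le \sum_{j=1}^{K} | v_j - u_j |^2
  = \|v - u\|_2^2 .
```

Under this assumption it therefore suffices to check each scalar nonlinearity separately.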
"c" - storage of the variable c for quick read/write access
"active" - a dynamically populated list of active worker resources
"computeHist" - history of all worker contributions
"controls" - global parameters and metadata for worker resources
Fig. 2. A qualitative description of the organization of the set of equations in (2) and the signals to be processed into a generic database.
The procedures for worker initialization and processing associated with two distributed solvers are also provided. Screen captures from
O-SPC illustrate the global controller and computed solution of a non-negative least squares problem obtained using 24 distributed workers
implementing an iterative filtered solver with parameter p = 0.5. A breakdown of the workers' computational platform allocation and
individual contributions to the overall solution is also depicted.


III. Implementation as a web service 

We now overview the operating principle behind O-SPC which we believe suggests opportunity in designing 
future systems in this way. Referring to the website content, several optimization problems frequently occurring in 
signal processing contexts have been assembled into an examples library, including those discussed in Section IV. 

In the remainder of this section we present the details associated with two distributed and four non-distributed 
solvers used to obtain a solution (c*,d*) to (2). We comment upfront, however, that the specific forms of the solvers
presented in this section differ from the implementations in O-SPC in that they have been adapted here for the
purpose of clarity rather than computational efficiency.

A. Distributed implementations 

A longstanding approach to efficiently solving a large class of numerical problems is to recast any problem of 
the class into a fixed representation to which a set of generic tools may be immediately applied. In this spirit, a 
signal processing system conforming to (2) is automatically synthesized once the parameters of a problem have 
been specified, from which several solvers corresponding to various distributions of state and processing instructions 
may be applied. 

Consistent with the general implementation strategy discussed in Section I, Fig. 2 illustrates the organization 
of the set of equations in (2) into a generic key-value store, e.g. a non-relational database, for implementation on 
the graph depicted in Fig. 1(e). The protocol for state transfer consists of workers asynchronously accessing the 
database to retrieve a subset of the computation and the associated signals to be processed, processing these signals, 
and asynchronously writing the result back into the database. This strategy represents a form of object-oriented 
signal processing wherein the objects contain data in the form of the signals to be processed and methods in the 
form of processing instructions. 
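The organization described above can be sketched as a nested key-value document. The following is a minimal illustration, not the schema used by O-SPC: the top-level keys "c", "active", "computeHist" and "controls" and the field labels m, f and Grow are taken from Fig. 2 and the text, whereas the function name `build_instance`, the container key "vars", the zero initialization of c, and the use of string labels for the nonlinearities are assumptions made here for the example.

```python
import json


def build_instance(G, f, m_labels, p):
    """Assemble a hypothetical problem-instance document for a key-value store.

    G        -- K x K matrix given as a list of rows
    f        -- bias vector of length K
    m_labels -- one label per coordinate naming its scalar nonlinearity
    p        -- filtering metaparameter read by workers at initialization
    """
    K = len(f)
    if len(G) != K or any(len(row) != K for row in G) or len(m_labels) != K:
        raise ValueError("G must be K x K and m_labels must have K entries")
    return {
        "c": [0.0] * K,        # shared state vector (assumed to start at zero)
        "vars": [              # one object per state variable: data plus instructions
            {"m": m_labels[j], "f": f[j], "Grow": list(G[j])}
            for j in range(K)
        ],
        "active": {},          # dynamically populated list of active workers
        "computeHist": [],     # history of worker contributions
        "controls": {"p": p},  # global parameters and metadata
    }


def to_json(instance):
    """Serialize the instance the way it might be uploaded to a JSON document store."""
    return json.dumps(instance, sort_keys=True)
```

Storing each row of G next to the corresponding bias and nonlinearity label means a worker needs a single read of one per-variable object, plus the vector c, to process a coordinate.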

Screen captures from the O-SPC application interface are provided in Fig. 2 for a non-negative least squares 
problem, depicting the dashboard through which distributed workers can be controlled. Through the dashboard 
interface, metaparameters for the problem can be set, in turn generating a corresponding uniform resource locator 
(URL) through which workers can attach to the problem instance to perform computation. For worker devices with 
integrated cameras, a quick response (QR) code is also dynamically generated. The particular solution depicted in 
Fig. 2 was obtained using 24 distributed workers. Analytics regarding the computational platforms of the connected 
workers, as well as the individual contributions to the overall optimization progress, are provided via dynamically
generated graphs.

It is worth noting that nearly any computational resource equipped with network access and a basic JavaScript 
engine may be utilized as a worker on O-SPC. For example, a heterogeneous set of workers might include modern 
web browsers on mobile, tablet and desktop machines as well as JavaScript-enabled microcontrollers [14], [15].

The worker initialization and processing instructions for two distributed solvers are summarized in column 3 
of Fig. 2. Specifically, each worker, independent of any and all other workers, performs the following steps ad 
infinitum to implement an iterative filtered solver: 












































(1) generate a random integer $j \in \{1, \dots, K\}$ corresponding to the state variables $c_j$ and $d_j$ to be processed;

(2) read the current state of the vector $c$ as well as the object var j consisting of a characterization of the
nonlinearity $m_j$ labeled m, the value of $f_j$ labeled f, and the row vector corresponding to the $j$-th row
of $G$ labeled Grow;

(3) generate the intermediary state value $d_j$ as

$d_j \leftarrow G_{j,1} c_1 + \cdots + G_{j,K} c_K + f_j$ (5)

(4) generate the new state value $c_j$ as

$c_j \leftarrow p\, m_j(d_j) + (1 - p)\, c_j$ (6)

where the filtering parameter $p$ is a metaparameter obtained during the worker initialization phase;

(5) asynchronously write the new state value $c_j$ into the $j$-th position of $c$ in the database.

For iterative solvers, the state variable d does not need to be explicitly stored in the database. Indeed, once the 
partial solution c* is identified, the state vector d* may be generated using (2) and thus the original optimization 
problem is effectively solved. Referring again to Fig. 2, the processing procedure for an iterative solver corresponds 
to modifying the instructions outlined above by setting p = 1 in (6). 
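The five worker steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the O-SPC implementation: the database is replaced by an in-memory dict, the names `make_toy_instance`, `worker_step`, `run_worker` and the key "vars" are hypothetical, indices are 0-based, and the nonlinearity m(x) = 0.5x is a toy contractive map chosen only so that this two-variable demo is guaranteed to converge (it is not one of the nonlinearities from Section IV).

```python
import random

# Hypothetical registry mapping the stored label "m" to a scalar nonlinearity.
NONLINEARITIES = {"half": lambda x: 0.5 * x}  # toy map, assumed for the demo


def make_toy_instance():
    """In-memory stand-in for the database holding one problem instance."""
    G = [[0.0, -1.0], [1.0, 0.0]]  # orthogonal matrix (rotation by 90 degrees)
    f = [1.0, 2.0]
    return {
        "c": [0.0, 0.0],
        "vars": [{"m": "half", "f": f[j], "Grow": G[j]} for j in range(2)],
        "controls": {"p": 0.5},
    }


def worker_step(db, rng):
    """One pass through steps (1)-(5) for a single, uncoordinated worker."""
    K = len(db["c"])
    j = rng.randrange(K)                     # (1) pick a coordinate at random
    c = list(db["c"])                        # (2) read a snapshot of c ...
    var_j = db["vars"][j]                    #     ... and the object for coordinate j
    m_j = NONLINEARITIES[var_j["m"]]
    d_j = sum(g * c_k for g, c_k in zip(var_j["Grow"], c)) + var_j["f"]  # (3)
    p = db["controls"]["p"]
    c_j = p * m_j(d_j) + (1.0 - p) * c[j]    # (4) filtered update
    db["c"][j] = c_j                         # (5) write back the j-th entry only
    return j, c_j


def run_worker(db, iterations, seed=0):
    """Run a bounded number of steps (a real worker would loop indefinitely)."""
    rng = random.Random(seed)
    for _ in range(iterations):
        worker_step(db, rng)
    return db["c"]
```

For this toy instance the fixed point of c = m(Gc + f) is c = (0, 1), so `run_worker(make_toy_instance(), 2000)` should return values close to that point. Setting `db["controls"]["p"] = 1.0` gives the unfiltered iterative variant.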

We call special attention to the fact that no attempt is made at the algorithm level to regulate global task 
allocation among the workers nor to enforce concurrency of any form. Specifically, the data requests and updates
are respectively executed using asynchronous read and write operations with no concept of precedent or preference 
among the workers. For example, if multiple workers request data associated with the same state variable Cj and 
each experiences a different latency (and thus each possibly retrieves different state vectors c) then the database 
records the updates in the order they are received irrespective of the order of the read operations. 

Referring to Fig. 1(e), the database might simultaneously contain numerous active problem instances. Workers 
may be added or removed at any time (including changing problem instances) without any form of coordination 
since workers are never assigned responsibility for any particular part of the workload. In this sense, O-SPC
facilitates the time-varying allocation of compute resources in order to adaptively respond to real-time constraints, 
time-varying network congestion, and resource outages. Another advantage to utilizing the presented approach for 
solving optimization problems in practice is the ability to update the portion of the database (and by extension 
the signal processing structure as well) associated with measurements and/or observations as new data becomes 
available. The response of the system is then to transform the state of the database associated with the current solution 
toward the new fixed-point or invariant state corresponding to the new solution. Consequently, the distributed solvers 
summarized in Fig. 2 are sufficient for solving a broad class of optimization problems over delay or disruption 
tolerant networks and further do not rely critically upon the availability or synchronization of any particular compute 
resources. 

B. Non-distributed implementations 

The toolset in O-SPC also provides support for four local or non-distributed solver types which organize and
implement the associated signal processing system using a single JavaScript-enabled web browser as the compute
engine. We define an asynchronous implementation protocol in this setting as one for which the behavior of the
system state is that of coordinate-wise discrete-time sample-and-hold elements triggered by discrete-time Bernoulli 
processes. 

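More formally, let $\mathcal{I}_n$, $n \in \mathbb{N}$, denote a sequence of randomly generated subsets of $\{1, \dots, K\}$ such that for
every value of $n$ each $i \in \{1, \dots, K\}$ is included in $\mathcal{I}_n$ with probability $p$ and not included with probability $1 - p$,
independently and independent of $n$. Further, denote $\mathcal{I}_n^c$ as the set complement of $\mathcal{I}_n$, i.e. $\mathcal{I}_n^c = \{ i \in \{1, \dots, K\} \mid i \notin \mathcal{I}_n \}$,
and let $I_{\mathcal{I}_n}$ denote the diagonal matrix with ones on the diagonal entries indicated by the index set $\mathcal{I}_n$ and
zeros elsewhere. Then, the update procedure for the state sequence $d^n$ given by

$d^n = I_{\mathcal{I}_n} \left( G\, m(d^{n-1}) + f \right) + I_{\mathcal{I}_n^c}\, d^{n-1}, \quad n \in \mathbb{N},$ (7)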
Fig. 3. The procedures for system initialization and processing associated with four non-distributed solvers. Screen captures from O-SPC
illustrate the solution to a LASSO or basis pursuit denoising problem [16] obtained by running an incremental filtered solver with filter
parameter p = 0.5 and p = 0.25.


corresponds to the iterative solver for the signal processing system depicted in Fig. 1(d). Reorganizing the 
computation and modifying the initial conditions such that the first difference of the signals rather than the signals 
themselves are being processed results in the incremental solver processing procedure where ^ and 

(m , n € N. (8) 

The system initialization and processing procedure for the local solvers discussed hereto and their filtered 
counterparts are summarized in Fig. 3. In addition, screen captures from O-SPC illustrate the solution obtained 
to a sparse signal recovery problem wherein the signal processing system was implemented using the incremental 
filtered solver with p = 0.5 and p = 0.25. 
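The sample-and-hold behavior of the local asynchronous iterative solver can be sketched as follows: at every step each coordinate of the state d is refreshed with probability p and held at its previous value otherwise. This is a minimal sketch assuming the refreshed value of coordinate i is the i-th entry of G m(d) + f, consistent with (2); the function names `async_iterative_step` and `run_local_solver`, the toy matrix, and the contractive nonlinearity m(x) = 0.5x used in the usage note are assumptions made so that the demo provably converges.

```python
import random


def async_iterative_step(d, G, f, m, p, rng):
    """Apply one Bernoulli-triggered sample-and-hold update to the state d.

    Each coordinate is replaced by the corresponding entry of G m(d) + f with
    probability p and is left unchanged with probability 1 - p.
    """
    K = len(d)
    md = [m(x) for x in d]  # coordinatewise nonlinearity applied to the old state
    candidate = [sum(G[i][k] * md[k] for k in range(K)) + f[i] for i in range(K)]
    return [candidate[i] if rng.random() < p else d[i] for i in range(K)]


def run_local_solver(d0, G, f, m, p, iterations, seed=0):
    """Iterate the update a fixed number of times from the initial state d0."""
    rng = random.Random(seed)
    d = list(d0)
    for _ in range(iterations):
        d = async_iterative_step(d, G, f, m, p, rng)
    return d
```

As a usage example, with `G = [[0.0, -1.0], [1.0, 0.0]]`, `f = [1.0, 2.0]` and `m = lambda x: 0.5 * x`, the fixed point of d = G m(d) + f is d = (0, 2), and `run_local_solver([0.0, 0.0], G, f, m, 0.5, 500)` should return values close to it. Choosing p = 1 recovers the synchronous iteration, while p = 0 holds the state indefinitely.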


IV. Numerical examples 

Mathematical optimization typically manifests itself in signal processing applications as either a design tool for 
optimal parameter selection or a processing stage in the signal chain itself. In this section we provide context and 
commentary for examples of these types depicted in Figs. 2 and 3. In addition, we present a third and final example 
related to error correction in transform coding theory solved using O-SPC. The obtained solutions agree with those 
generated by CVX [17]. We conclude with a discussion of the relationships between the specific signal processing 
systems associated with the examples. 

A. Sparse signal recovery 

A well-established approach to recovering a sparse signal measured through an underdetermined linear system 
that has potentially been corrupted by noise is to solve the LASSO or basis pursuit denoising problem. In particular, 
this recovery formulation is posed as a regularized least squares problem of the form 

$\text{minimize} \quad \tfrac{p}{2}\|Ax - y\|_2^2 + \|x\|_1$ (9)

where $A \in \mathbb{R}^{m \times n}$ is the linear measurement system, $y \in \mathbb{R}^m$ is a vector of measurements, $p > 0$ scales the objective
function, and $x \in \mathbb{R}^n$ is the desired sparse vector. We draw $A$ at random from a Gaussian ensemble to ensure it
satisfies the restricted isometry property with high probability [16]. The solution depicted in Fig. 3, solved using
O-SPC, corresponds to $(m, n) = (60, 128)$. Problem sizes of the order $(m, n) = (2400, 5120)$ were additionally
solved, i.e. where $A$ has approximately 12 million non-zero entries.

B. Non-negative least squares 

The non-negative least squares problem, which is commonly used as a subroutine in solving more general non-negative
tensor factorization problems, is formulated as the quadratic program

$\text{minimize} \quad \tfrac{1}{2}\|Ax - y\|_2^2 \quad \text{s.t.} \quad x \ge 0$ (10)























Fig. 4. An illustration of the computed solution of (11) obtained using 50 distributed workers implementing an iterative filtered solver with
parameter p = 0.75. A breakdown of the workers' computational platform distribution and individual contributions to the overall iteration
count is also depicted.


where $A \in \mathbb{R}^{m \times n}$ is a general linear system, $y \in \mathbb{R}^m$ is a vector of observations, and $x \in \mathbb{R}^n$ is the desired
non-negative vector. For the example depicted in Fig. 2, $m$ and $n$ were respectively selected to be 128 and 60.
Constrained least squares problems such as (10) have immediate application to system design in a number of ways. 
For example, in the context of filter design, (10) facilitates the design of filters including peak-constrained least 
squares filters [18] with additional non-negativity constraints on the filter taps enabling their use on implementation 
technologies with unsigned number systems. 

C. Error correction decoding 

Let $A \in \mathbb{R}^{m \times n}$ denote a linear codebook, i.e. with each column of $A$ denoting a codeword, and consider the
recovery of a plaintext vector $x \in \mathbb{R}^n$ from a cyphertext vector $y \in \mathbb{R}^m$ which has been additively corrupted by a
$p$-sparse noise vector $z$ according to $y = Ax + z$. We cast the recovery procedure as the problem

$\text{minimize} \quad \|Ax - y\|_1,$ (11)

hence decoding a given cyphertext vector in this way corresponds to solving a linear program since (11) may
be recast as the standard basis pursuit problem. Furthermore, (11) is guaranteed to identify the correct plaintext
vector $x^*$ so long as $A$ and the triple $(n, m, p)$ satisfy the conditions provided in [19]. Figure 4 depicts the solution
obtained from the numerical experiment outlined in [19] using the distributed iterative filtered solver presented in
this paper, where we specifically select the transmitted plaintext vector to be binary valued and round the decoded
plaintext vector for further noise suppression. The solver was implemented using 50 distributed workers. Note that
the plaintext vector obtained using (11) is indeed the synthetic plaintext vector before transmission.

D. Comments on the example signal processing systems 

The optimization problems (9)-(11) were specifically chosen to underscore the flexibility and generality of the
framework in [5] with respect to the implementation paradigm discussed in this paper. In particular, for the same
linear system $A$ and observation vector $y$, the signal processing systems associated with these three problems differ
only in the analytic form of the nonlinearity $m(\cdot)$ used in defining the transformed stationarity conditions (2). The
coordinatewise nonlinearity $m_{(9)} : \mathbb{R} \to \mathbb{R}$ associated with the sparse signal recovery problem (9) is given by

$m_{(9)}(x) = \begin{cases} -x, & |x| \le 1 \\ x - 2\,\mathrm{sign}(x), & |x| > 1 \end{cases}$

whereas the coordinatewise nonlinearities $m_{(10)} : \mathbb{R} \to \mathbb{R}$ and $m_{(11)} : \mathbb{R} \to \mathbb{R}$ respectively associated with the
non-negative least squares problem (10) and the error correction decoding problem (11) are given by $m_{(10)}(x) = |x|$ and
$m_{(11)}(x) = m_{(9)}(-x)$. Each of these nonlinearities (as scalar operators or stacked into an operator from $\mathbb{R}^K$ into
itself) satisfies the sufficient condition for convergence in (4) and thus, for example, the filtered solvers discussed
in Subsections III-A and III-B may be directly utilized to solve the corresponding problems. We conclude with a
remark on the similarity of the complexity associated with solving (9) and (11) in the sense of identifying fixed
points of the algebraic system (2) due in part to the relationship between $m_{(9)}$ and $m_{(11)}$. This similarity may not
be readily apparent from the optimization problem statements since (11) is a linear program while (9) is convex
quadratic, but can be leveraged to efficiently solve both problem instances without replicating the entire problem
in the database.
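The three scalar nonlinearities can be written down directly and spot-checked against condition (4). The sketch below assumes the piecewise form m_(9)(x) = -x for |x| <= 1 and m_(9)(x) = x - 2 sign(x) otherwise, together with m_(10)(x) = |x| and m_(11)(x) = m_(9)(-x); the function names and the grid-based check are illustrative choices, and a finite grid can only fail to find a violation, not prove non-expansiveness.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def m9(x):
    """Assumed nonlinearity for the sparse signal recovery problem (9)."""
    return -x if abs(x) <= 1 else x - 2 * sign(x)


def m10(x):
    """Nonlinearity for the non-negative least squares problem (10)."""
    return abs(x)


def m11(x):
    """Nonlinearity for the error correction decoding problem (11)."""
    return m9(-x)


def is_nonexpansive_on_grid(m, lo=-3.0, hi=3.0, steps=121, tol=1e-12):
    """Check |m(v) - m(u)| <= |v - u| for all pairs of points on a uniform grid."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return all(abs(m(v) - m(u)) <= abs(v - u) + tol for u in grid for v in grid)
```

Because m_(11) is obtained from m_(9) by negating the argument, a store that already holds a characterization of m_(9) needs only a flag, rather than a second copy of the data, to describe the decoding problem; this is one way to read the closing remark above.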